
The text of all pending claims (including withdrawn claims) is set forth below. Cancelled 
and not entered claims are indicated with claim number and status only. The claims as listed 
below show added text with underlining and deleted text with strikethrough. The status of each 
claim is indicated with one of (original), (currently amended), (cancelled), (withdrawn), (new), 
(previously presented), or (not entered). 

1. (currently amended) A recording medium for storing a program executable by an 
information apparatus for implementing a parallel matrix processing method in matrix 
processing, which includes LU factorization, and is carried out by a shared-memory scalar 
parallel-processing computer having a plurality of processor modules, secondary caches 
corresponding respectively to the processor modules, primary caches respectively included in 
the processor modules, an interconnection network connecting the processor modules via the 
secondary caches, and a plurality of memory modules which the processor modules can access 
via the interconnection network, said method comprising: 

dividing a matrix into small matrix blocks consisting of a diagonal block defined to align 
along the diagonal part, a block that is long in the row direction and positioned contiguously 
to the diagonal block, a block that is long in the column direction and positioned contiguously to 
the diagonal block and a square block; 

storing the diagonal blocks and small matrix blocks obtained by equally dividing the block 
that is long in the column direction and positioned contiguously below the diagonal block in local 
memories of said processor modules; 

processing, in parallel by said processor modules, the blocks stored respectively in the 
local memories of said processing modules together with the diagonal block and the block that is 
long in the row direction or column direction so as to be the same with the calculation results of 
corresponding block parts when a square matrix is LU factorized; and 

updating the square block by deducting from the square block the products between the 
block that is long in the row direction and the block that is long in the column direction using 
results of processing of said small matrix blocks obtained at said processing step. 
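
The dividing, processing, and updating steps recited above can be illustrated with a minimal sequential sketch. It assumes no pivoting, a hypothetical function name `blocked_lu`, and a hypothetical block width `bs`; the per-processor distribution of the column-long block is omitted, so this is only one possible reading of the steps, not the claimed implementation.

```python
def blocked_lu(a, bs):
    """In-place blocked LU without pivoting (illustrative sketch only).

    `a` is a square matrix as a list of row lists; `bs` is an assumed
    block width. On return, the strict lower triangle holds L (unit
    diagonal implied) and the upper triangle holds U.
    """
    n = len(a)
    for k in range(0, n, bs):
        e = min(k + bs, n)  # the diagonal block spans rows/columns k..e-1

        # Factor the diagonal block together with the column-long block below it.
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]

        # Row-long block: forward substitution with the unit lower
        # triangle of the diagonal block.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]

        # Square block: subtract (column-long block) x (row-long block).
        for i in range(e, n):
            for c in range(e, n):
                s = 0.0
                for j in range(k, e):
                    s += a[i][j] * a[j][c]
                a[i][c] -= s
    return a
```

In a parallel setting, the first loop nest is the part that would be split across processor modules by rows of the column-long block, and the last loop nest is the square-block update.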

3. (previously presented) A recording medium according to claim 1, said method 
further comprising: 

extracting pivot candidates, each of which represents a matrix element associated with 
the largest value of the concerned matrix, from data of said small matrix blocks processed by 
said processor modules; and 

determining a final one of said pivots with a maximum data value among said candidates 
in a memory area common to said processor modules, and 

wherein said LU factorization is carried out by using said determined pivot. 
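
The two-stage pivot selection recited above can be sketched as follows. The sketch assumes that "largest value" means largest absolute value, as in conventional partial pivoting, which the claim does not state. The names `local_candidate` and `select_pivot`, the use of threads to stand in for processor modules, and the list `candidates` standing in for the common memory area are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def local_candidate(column, start, stop):
    """Return (magnitude, row index) of the largest-magnitude entry in column[start:stop]."""
    best = max(range(start, stop), key=lambda i: abs(column[i]))
    return abs(column[best]), best


def select_pivot(column, first_row, num_workers):
    """Pick the pivot row at or below `first_row` from per-worker candidates.

    Assumes at least one row remains below `first_row`.
    """
    last = len(column)
    rows = last - first_row
    chunk = -(-rows // num_workers)  # ceiling division: rows per worker
    bounds = [(s, min(s + chunk, last)) for s in range(first_row, last, chunk)]

    # Each worker extracts one candidate from its own slice of the column.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        candidates = list(pool.map(lambda b: local_candidate(column, *b), bounds))

    # `candidates` plays the role of the common memory area; the final
    # pivot is the candidate with the maximum magnitude.
    return max(candidates)[1]
```

Ties between candidates of equal magnitude resolve to the larger row index here, which is an arbitrary choice of this sketch.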

4. (previously presented) A recording medium according to Claim 2 wherein said 
LU factorization of said entire matrix is completed by execution of the method comprising: 

sequentially updating portions of said matrix starting with one on the outer side of said 
matrix in accordance with a recursive algorithm; and 

eventually applying said LU factorization by using one processor module to a portion that 
remains to be updated inside said matrix. 

5. (previously presented) A recording medium according to Claim 1 wherein said 
matrix processing is Cholesky factorization or a modified version of said Cholesky factorization 
applied to said matrix. 

6. (previously presented) A recording medium according to Claim 5 wherein said 
Cholesky factorization or said modified version of said Cholesky factorization is carried out to 
complete said LU factorization of said entire matrix by execution of the method comprising: 

sequentially updating portions of said matrix starting with one on the outer side of said 
matrix in accordance with a recursive algorithm; and 

eventually applying said LU factorization by using one processor module to a portion that 
remains to be updated inside said matrix. 

7. (previously presented) A recording medium according to Claim 5 wherein, at said 
updating step, 

a triangular matrix portion of each of said small matrix block to be updated is divided into 
2 x N fine blocks wherein the symbol N denotes the number of processor modules; and 

said fine blocks are assembled to form N pairs each stored in a local memory area of 
one of said processor modules to be processed by said processor module. 
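
One way to form the N pairs from the 2 x N fine blocks is sketched below. The claim does not state a pairing rule; the sketch assumes the triangular portion is cut into 2N strips of equal height, so that strip i carries work roughly proportional to i + 1, and pairs strip i with strip 2N - 1 - i to equalize the load. The function names are hypothetical.

```python
def pair_fine_blocks(num_processors):
    """Pair fine block i with fine block 2N-1-i (assumed pairing rule)."""
    total = 2 * num_processors
    return [(i, total - 1 - i) for i in range(num_processors)]


def pair_workloads(num_processors):
    """Relative work per pair, assuming strip i of the triangle costs i + 1 units."""
    return [(i + 1) + (j + 1) for i, j in pair_fine_blocks(num_processors)]
```

Under the stated cost assumption every pair carries 2N + 1 units of work, so each processor module receives the same share of the triangular update.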

8. (currently amended) A parallel matrix processing method applied to matrix 
processing, which includes LU factorization, and is carried out by a shared-memory scalar 
parallel-processing computer having a plurality of processor modules, secondary caches 
corresponding respectively to the processor modules, primary caches respectively included in 
the processor modules, an interconnection network connecting the processor modules via the 
secondary caches, and a plurality of memory modules which the processor modules can access 
via the interconnection network, said parallel matrix processing method comprising: 

dividing a matrix into small matrix blocks consisting of a diagonal block defined to align 
along the diagonal part, a block that is long in the row direction and positioned contiguously 
to the diagonal block, a block that is long in the column direction and positioned contiguously to 
the diagonal block and a square block; 

storing the diagonal blocks and small matrix blocks obtained by equally dividing the block 
that is long in the column direction and positioned contiguously below the diagonal block in local 
memories of said processor modules; 

processing, in parallel by said processor modules, the blocks stored respectively in the 
local memories of said processing modules together with the diagonal block and the block that is 
long in the row direction or column direction so as to be the same with the calculation results of 
corresponding block parts when a square matrix is LU factorized; and 

updating the square block by deducting from the square block the products between the 
block that is long in the row direction and the block that is long in the column direction using 
results of processing of said small matrix blocks obtained at said processing step. 

9. (currently amended) A shared-memory scalar parallel-processing computer 
having a plurality of processor modules, said shared-memory scalar parallel-processing 
computer comprising: 

a blocking unit dividing a matrix into small matrix blocks consisting of a diagonal block 
defined to align along the diagonal part, a block that is long in the row direction and 
positioned contiguously to the diagonal block, a block that is long in the column direction and 
positioned contiguously to the diagonal block and a square block; 

a storage unit storing diagonal blocks and small matrix blocks obtained by equally 
dividing the block that is long in the column direction and positioned contiguously below the 
diagonal block in local memories of said processor modules; 

a processing unit processing, in parallel by said processor modules, the blocks stored respectively in 
the local memories of said processing modules together with the diagonal block and the block 
that is long in the row direction or column direction so as to be the same with the calculation 
results of corresponding block parts when a square matrix is LU factorized; and 

an updating unit updating the square block by deducting from the square block the 
products between the block that is long in the row direction and the block that is long in the 
column direction using results of processing of said small matrix blocks produced by said 
processing means. 



